Overview of operations continued

Category: Transform Datasets

Operation name

Operation function

Aggregate

Perform mathematical operations, such as finding the average value, the standard deviation, or the row count, on a group of numbers. Integer or string fields can be selected for grouping, and the required aggregate operation is performed on each resulting group.
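The Aggregate operation is configured in the CSense interface; purely as an illustration of grouped aggregation, a minimal pandas sketch follows (the field names and values are invented):

```python
import pandas as pd

# Hypothetical dataset: "line" is a string field used for grouping
df = pd.DataFrame({
    "line": ["A", "A", "B", "B", "B"],
    "temperature": [70.1, 71.4, 68.9, 69.7, 70.3],
})

# Average, standard deviation, and row count per group
summary = df.groupby("line")["temperature"].agg(["mean", "std", "count"])
print(summary)
```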

Copy

Create an exact copy of the dataset.

Reverse

Reverse the order of the values of the dataset. This operation requires no configuration; clicking the operation creates a new dataset with the order of the values reversed.

Convert Type

Change the data types of the fields in the dataset. Valid data types are integer, double, string, and date/time. Fields that are not converted can either be included unchanged in the resulting dataset or excluded from it.
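As a rough analogy only (not the CSense implementation), type conversion with optional exclusion of unconverted fields might look like this in pandas, with made-up field names:

```python
import pandas as pd

df = pd.DataFrame({"count": ["1", "2", "3"], "label": ["a", "b", "c"]})

converted = df.copy()
converted["count"] = converted["count"].astype(int)   # string -> integer
# converted = converted.drop(columns=["label"])       # optionally exclude unconverted fields
print(converted.dtypes)
```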

Sort

Prioritize the dataset fields, and then sort the rows by the values of the highest priority field in ascending or descending order. The corresponding rows of the other dataset fields are moved according to the sorting of the highest priority field. Where rows contain identical values in the highest priority field, the sorting criteria of the second highest priority field are applied. It is therefore important to list the fields in order of priority, with the most important field first, and to specify ascending or descending order for each field.
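To illustrate the priority idea outside CSense, here is a pandas sketch with invented fields: the first sort key has the highest priority and ties are broken by the next key.

```python
import pandas as pd

df = pd.DataFrame({
    "batch": [2, 1, 2, 1],
    "stage": ["B", "A", "A", "B"],
    "value": [10.0, 12.5, 9.8, 11.1],
})

# "batch" has the highest priority; rows of every other field move with the keys
sorted_df = df.sort_values(by=["batch", "stage"], ascending=[True, False])
print(sorted_df)
```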

Transpose

Transpose data for summarization, producing meaningful information from a table of data. The structure of the original dataset is not changed in any way; instead, transposing with pivot tables automatically sorts, counts, and sums the data stored in one table and creates a second table (the "pivot table") to present the summarized data. The pivot table supports several kinds of aggregation, for example sum, average, standard deviation, and count.
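The pivot-table concept can be sketched, independently of CSense, with pandas (hypothetical fields):

```python
import pandas as pd

df = pd.DataFrame({
    "shift":   ["day", "day", "night", "night", "day"],
    "machine": ["M1", "M2", "M1", "M2", "M1"],
    "output":  [120, 95, 110, 90, 130],
})

# Summarize output per shift and machine; the original frame is left untouched
pivot = pd.pivot_table(df, values="output", index="shift",
                       columns="machine", aggfunc="sum")
print(pivot)
```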

Cross correlation

Automatically calculate the lag required for the highest correlation between a change in a field value and the effect seen on the process target field. The lag is calculated either over time or over the row indexes.
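The idea of searching for the row lag with the strongest correlation can be sketched as follows; this is only an illustration with synthetic data, not the CSense algorithm:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=200))                 # input field
y = x.shift(5) + rng.normal(scale=0.1, size=200)    # target responds 5 rows later

# Try a range of row lags and keep the one with the strongest correlation
lags = range(20)
corrs = [x.shift(lag).corr(y) for lag in lags]
best_lag = max(lags, key=lambda lag: corrs[lag])
print(best_lag)   # expected: 5
```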

Correlation

Create a correlation matrix of selected fields. The correlation matrix can be created using either an index or the timestamp as a base for the correlation calculation. Correlation is represented as a number, indicating the strength of a linear relationship between two random variables. This operation will create a new dataset, containing the correlation matrix.
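For reference, a correlation matrix of selected fields looks like this in pandas (illustrative field names only):

```python
import pandas as pd

df = pd.DataFrame({
    "feed_rate":   [1.0, 1.2, 1.1, 1.4, 1.3],
    "temperature": [70.2, 71.0, 70.6, 72.1, 71.5],
    "yield":       [0.91, 0.93, 0.92, 0.95, 0.94],
})

# Pairwise correlation of the selected fields; values lie between -1 and 1
corr_matrix = df.corr()
print(corr_matrix)
```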

Delay

Create a dataset with delayed data values for selected fields. These delays can be applied either as a number of rows or as a number of seconds. Multiple fields can be delayed, each by a different value.
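A row-based delay is essentially a shift of each selected field; as a sketch only (not the CSense implementation), with invented field names:

```python
import pandas as pd

df = pd.DataFrame({
    "flow":     [5.0, 5.2, 5.1, 5.3, 5.4],
    "pressure": [101, 102, 103, 104, 105],
})

# Delay each selected field by a different number of rows; a time-based delay
# would instead shift against a datetime index, e.g. shift(freq="30s")
delayed = df.copy()
delayed["flow"] = df["flow"].shift(2)
delayed["pressure"] = df["pressure"].shift(1)
print(delayed)
```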

ISV Action Object

Reuse the functionality of an existing data manipulation blueprint created within the Architect. Use this operation to supply new data as the source for the Action Object blueprint by mapping the new data fields to those required within the Action Object. The functionality of each block is then applied to the new data. The blueprint outputs are no longer sent to a sink block within the Architect; instead, they are saved as a new dataset. This operation will either use the timestamp field of the dataset or create a timestamp field for the Action Object. Bad quality fields are recorded in the Action Object with empty values. Note that for this operation to work, all the blocks used to create the Action Object blueprint within the Architect need to be licensed and registered to you. If they are not registered, the operation will fail.

Join Timestamps

Merge two or more timestamp fields into one timestamp field, with all timestamps listed chronologically. The data values of other fields are still listed at their original timestamps, and can either be repeated for the additional timestamps in the joined timestamp field, or be interpolated to generate values for the additional timestamps. This operation creates a new dataset, with new fields. These new fields contain both original data values as well as repeated or interpolated values corresponding to the additional timestamps. Original fields need to be mapped to the selected timestamp fields during configuration.
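As an illustration of the repeat-versus-interpolate choice (not the CSense implementation), two small datasets with different timestamps could be combined like this:

```python
import pandas as pd

a = pd.DataFrame({"ts": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 00:10"]),
                  "temp": [70.0, 71.0]})
b = pd.DataFrame({"ts": pd.to_datetime(["2023-01-01 00:05", "2023-01-01 00:15"]),
                  "flow": [5.0, 5.5]})

# Merge the timestamp fields into one chronological sequence
joined = a.set_index("ts").join(b.set_index("ts"), how="outer").sort_index()

repeated = joined.ffill()                          # repeat the last known value
interpolated = joined.interpolate(method="time")   # or interpolate between values
print(repeated)
```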

Moving Statistics

Calculate statistics of the values of each field over a specified window. This window will move across rows of data, with the statistics being calculated either over a number of rows or over a specified time period.
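A moving (rolling) window over rows can be sketched as below; the window size, field name, and data are made up:

```python
import pandas as pd

df = pd.DataFrame({"temp": [70, 71, 73, 72, 74, 75, 73]})

# Row-based window of 3 samples; a time-based window would use something like
# rolling("5min") on a datetime index instead
df["temp_mean"] = df["temp"].rolling(window=3).mean()
df["temp_std"] = df["temp"].rolling(window=3).std()
print(df)
```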

Resample

Resampling will create a new dataset with new start and end times, and a different sampling period. The values of the dataset will not be changed.
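Changing the sampling period without altering the stored values can be illustrated as follows (a sketch only; the CSense operation also lets you define new start and end times):

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=6, freq="10s")
df = pd.DataFrame({"level": [1.0, 1.2, 1.1, 1.3, 1.4, 1.5]}, index=idx)

# Re-sample to a 30-second period; existing values are carried over unchanged
resampled = df.asfreq("30s")
print(resampled)
```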

Statistics

Calculate statistics for selected fields.  Statistics can be calculated across data rows or over a time span, where the average between two consecutive timestamps is used. This operation will not affect the original dataset in any way. A new dataset will be created listing the statistics for the selected fields.

Category: Filter Datasets

Operation name

Operation function

Limit Values

Filter out values in your data by applying upper and/or lower limits. These limits can be different for each selected field.
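As an illustration of per-field limits (hypothetical fields and limits, not the CSense configuration):

```python
import pandas as pd

df = pd.DataFrame({"temp": [68, 75, 90, 72], "flow": [4.8, 5.6, 5.1, 7.0]})

# Keep only rows where each selected field stays inside its own limits
mask = df["temp"].between(70, 85) & df["flow"].between(5.0, 6.0)
filtered = df[mask]
print(filtered)
```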

Empty Values

Filter out the empty values from the selected fields, and create a new dataset where the rows that previously contained the empty values have been removed.

Timestamps

Filter out specific fields based on their timestamps by applying an upper and/or lower limit. This will create a new dataset that contains a field of only timestamps that fall within the limits specified.

SQL Expression

Filter selected fields by defining a WHERE clause in the SQL expression, indicating the conditions that need to be met in order for the values to be included in the new dataset. Only the fields selected will be listed in the new dataset.
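The WHERE-clause idea can be sketched with an in-memory SQLite table; the table name, fields, and condition are invented for illustration:

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"temp": [68, 75, 90], "grade": ["A", "B", "A"]})

# Only rows meeting the WHERE condition appear in the filtered result
with sqlite3.connect(":memory:") as con:
    df.to_sql("data", con, index=False)
    filtered = pd.read_sql_query(
        "SELECT temp, grade FROM data WHERE temp >= 70 AND grade = 'A'", con)
print(filtered)
```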

 

Category: Combine Datasets

Operation name

Operation function

Horizontal

Combine two datasets side by side. Choose the row numbers at which each dataset will align when merged. There is an option to only merge rows where both datasets have values at the same row number; this eliminates rows which only contain data from one dataset. This operation is not dependent on the timestamp field in any way. Timestamp fields are treated like any other field and will be listed next to each other in the new dataset; they will not influence the order of the values of the other fields.
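A side-by-side, row-aligned combination can be sketched as follows (illustrative only; the field names are made up):

```python
import pandas as pd

left = pd.DataFrame({"temp": [70.0, 71.0, 72.0]})
right = pd.DataFrame({"flow": [5.0, 5.1]})

# Place the datasets next to each other, aligned purely on row position
combined = pd.concat([left.reset_index(drop=True),
                      right.reset_index(drop=True)], axis=1)
both_only = combined.dropna()   # keep only rows where both datasets have values
print(combined)
```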

Relational

Merge two datasets at the points where the values of selected fields in one dataset match the values of selected fields in the second dataset. Then choose how the resulting dataset is presented: all the merged fields; only rows from both datasets that share common values; only rows from both datasets that do not share common values; or unmatched rows from both datasets, only the first dataset, or only the second dataset. This enables creating a combined dataset listing only the required relevant data.
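In relational terms this corresponds to choosing a join type; a pandas sketch with invented key and field names:

```python
import pandas as pd

left = pd.DataFrame({"batch": [1, 2, 3], "temp": [70.0, 71.5, 69.8]})
right = pd.DataFrame({"batch": [2, 3, 4], "yield": [0.92, 0.95, 0.90]})

# Merge where the selected key field values match; 'how' controls which rows
# survive: "inner" keeps only matching rows, "outer" keeps everything,
# "left"/"right" also keep unmatched rows from one side only
merged = pd.merge(left, right, on="batch", how="inner")
print(merged)
```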

Timestamp Range Merge

Merge two datasets, combining only the data values that fall within a defined timestamp range. The new dataset will contain the timestamps that fall within the specified timestamp range and the corresponding data values of the merged datasets.

Vertical

Merge two datasets one below the other. A new dataset will be created with the left-hand dataset at the top and the right-hand dataset underneath it. Select the fields from the two datasets to be included in the new dataset, then map the selected fields from each dataset to fields of the same type. This operation is not dependent on the timestamp field and will not merge the timestamp fields into one chronological sequence; the resulting timestamp field will contain the timestamps of the first dataset followed by the timestamps of the second dataset.
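A vertical stack of two datasets with like-typed fields mapped by name can be sketched as follows (illustrative only):

```python
import pandas as pd

top = pd.DataFrame({"temp": [70.0, 71.0], "flow": [5.0, 5.1]})
bottom = pd.DataFrame({"temp": [69.5, 70.5], "flow": [4.9, 5.2]})

# Stack the second dataset underneath the first; rows (and any timestamps)
# keep their original order and are not re-sorted chronologically
stacked = pd.concat([top, bottom], ignore_index=True)
print(stacked)
```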

Time

Merge two datasets, retaining only selected fields from each dataset. The new dataset will contain only one timestamp field, which is a combination of the timestamps of the two datasets. Where the two datasets have overlapping timestamps, only one timestamp is used and the values from both original datasets are listed against it. Where a timestamp did not exist in one of the original datasets, the last known value of that dataset is used, or the value is left empty if there is no prior value. Choose either to ignore empty timestamps in the new dataset or to have the operation fail when they occur; this ensures a dataset with no empty values in the timestamp field.
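The combined timestamp field with last-known-value filling can be sketched as follows (not the CSense implementation; the data are invented):

```python
import pandas as pd

a = pd.DataFrame({"ts": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 00:10"]),
                  "temp": [70.0, 71.0]}).set_index("ts")
b = pd.DataFrame({"ts": pd.to_datetime(["2023-01-01 00:05", "2023-01-01 00:10"]),
                  "flow": [5.0, 5.2]}).set_index("ts")

# One combined timestamp field; overlapping timestamps appear once, and gaps
# are filled with the last known value (left empty where no prior value exists)
combined = a.join(b, how="outer").sort_index().ffill()
print(combined)
```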

Category: Export Datasets

Operation name

Operation function

Text File

Select the dataset to export, and define the file name, file location and delimiter to use. Save the exported file as a .csv or .txt file.

Sink Block

Export any dataset to a sink block, such as a Text sink, an Optimised Database sink, or a Tabular Database sink. The sink block transfers data to a source block for use in other applications. Select which fields will be exported and configure timestamps and data qualities for these fields.

.NET Wrapper

Sink a dataset to a .NET object that was created in a separate application prior to using this operation. This sink object is a .NET assembly library file (DLL). Using the .NET object allows for reuse and interchangeability of code and datasets between different programs. The previously created .NET object determines the format in which the dataset is exported and how the dataset needs to be configured in order to be used by the .NET object.

 


 

CSense 2023 - Last updated: November 17, 2023